100 research outputs found

    The Flexible Group Spatial Keyword Query

    We present a new class of service for location-based social networks, called the Flexible Group Spatial Keyword Query, which enables a group of users to collectively find a point of interest (POI) that optimizes an aggregate cost function combining both spatial distances and keyword similarities. In addition, our query service allows users to consider the tradeoffs between obtaining a sub-optimal solution for the entire group and obtaining an optimized solution but only for a subgroup. We propose algorithms to process three variants of the query: (i) the group nearest neighbor with keywords query, which finds a POI that optimizes the aggregate cost function for the whole group of size n, (ii) the subgroup nearest neighbor with keywords query, which finds the optimal subgroup and a POI that optimizes the aggregate cost function for a given subgroup size m (m <= n), and (iii) the multiple subgroup nearest neighbor with keywords query, which finds optimal subgroups and corresponding POIs for each of the subgroup sizes in the range [m, n]. We design query processing algorithms based on branch-and-bound and best-first paradigms. Finally, we provide theoretical bounds and conduct extensive experiments with two real datasets which verify the effectiveness and efficiency of the proposed algorithms. Comment: 12 page
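
The subgroup variant described in this abstract can be illustrated with a brute-force sketch. Everything here is an assumption made for illustration: the abstract does not give the cost function, so the code uses a weighted sum of normalized Euclidean distance and Jaccard keyword dissimilarity, aggregates by summation, and scans every POI instead of using the branch-and-bound or best-first algorithms the paper proposes. The names `user_cost`, `best_poi_for_subgroup`, `alpha`, and `max_dist` are hypothetical.

```python
from math import dist


def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_cost(user, poi, alpha=0.5, max_dist=1.0):
    """Assumed per-user cost: weighted sum of spatial and keyword dissimilarity."""
    spatial = dist(user["loc"], poi["loc"]) / max_dist
    textual = 1.0 - jaccard(user["keywords"], poi["keywords"])
    return alpha * spatial + (1.0 - alpha) * textual


def best_poi_for_subgroup(users, pois, m, alpha=0.5, max_dist=1.0):
    """Brute force: for each POI keep the m cheapest users, return the best POI.

    Returns (total_cost, poi_name, subgroup_names). With m == len(users) this
    reduces to the whole-group variant.
    """
    best = None
    for poi in pois:
        costs = sorted((user_cost(u, poi, alpha, max_dist), u["name"]) for u in users)
        chosen = costs[:m]
        total = sum(c for c, _ in chosen)
        if best is None or total < best[0]:
            best = (total, poi["name"], [name for _, name in chosen])
    return best


# Hypothetical toy data: two users near a pizza place, one user at a sushi place.
users = [
    {"name": "A", "loc": (0.0, 0.0), "keywords": {"pizza"}},
    {"name": "B", "loc": (1.0, 0.0), "keywords": {"pizza"}},
    {"name": "C", "loc": (10.0, 10.0), "keywords": {"sushi"}},
]
pois = [
    {"name": "P1", "loc": (0.5, 0.0), "keywords": {"pizza"}},
    {"name": "P2", "loc": (10.0, 10.0), "keywords": {"sushi"}},
]

for m in (3, 2, 1):
    print(m, best_poi_for_subgroup(users, pois, m, max_dist=20.0))
```

Running the loop over the subgroup sizes from n down to 1 mirrors the multiple-subgroup variant in spirit: the whole group settles on one POI, while a smaller subgroup may be served much better by another.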

    Bulk Insertions into xBR+ -trees

    Bulk insertion refers to the process of updating an existing index by inserting a large batch of new data, treating the items of this batch as a whole rather than inserting them one-by-one. Bulk insertion is related to bulk loading, which refers to the process of creating a new index from scratch when the dataset to be indexed is available beforehand. The xBR+ -tree is a balanced, disk-resident, Quadtree-based index for point data, which is very efficient for processing spatial queries. In this paper, we present the first algorithm for bulk insertion into xBR+ -trees. This algorithm incorporates extensions of techniques that we have recently developed for bulk loading xBR+ -trees. Moreover, using real and artificial datasets of various cardinalities, we present an experimental comparison of this algorithm vs. inserting items one-by-one for updating xBR+ -trees, regarding performance (I/O and execution time) and the characteristics of the resulting trees. We also present experimental results regarding the query-processing efficiency of xBR+ -trees built by bulk insertions vs. xBR+ -trees built by inserting items one-by-one.

    Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

    We demonstrate that a graph-based search algorithm, relying on the construction of an approximate neighborhood graph, can directly work with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, which, in turn, lead to substantial performance degradation. Although straightforward metrization and symmetrization are usually ineffective, we find that constructing an index using a modified, e.g., symmetrized, distance can improve performance. This observation paves the way for a new line of research on designing index-specific graph-construction distance functions.

    Using metric space indexing for complete and efficient record linkage

    Record linkage is the process of identifying records that refer to the same real-world entities in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their similarity. Linkage is usually performed in a three-step process: first, groups of similar candidate records are identified using indexing, then pairs within the same group are compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as locality sensitive hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity, or they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing (MSI) to perform complete linkage, resulting in a parameter-free process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. An evaluation on real-world data from several domains shows that linkage using MSI can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process.
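
To make the idea of complete linkage over a metric space concrete, here is a minimal sketch of a threshold query that prunes candidates with the triangle inequality. It is not the paper's MSI structure: it assumes a single hand-picked pivot, Levenshtein edit distance as the metric, and a linear scan over precomputed pivot distances. `PivotIndex` and `range_query` are hypothetical names.

```python
def levenshtein(a, b):
    """Edit distance between two strings (a true metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class PivotIndex:
    """Stores each record's distance to one pivot so queries can skip records."""

    def __init__(self, records, pivot, distance=levenshtein):
        self.pivot = pivot
        self.distance = distance
        self.table = [(r, distance(r, pivot)) for r in records]

    def range_query(self, query, threshold):
        """Return (all records within threshold of query, distances computed)."""
        dq = self.distance(query, self.pivot)
        matches, computed = [], 0
        for rec, dp in self.table:
            # Triangle inequality: d(query, rec) >= |d(query, pivot) - d(rec, pivot)|,
            # so a record failing this test cannot be within the threshold.
            if abs(dq - dp) > threshold:
                continue
            computed += 1
            if self.distance(query, rec) <= threshold:
                matches.append(rec)
        return matches, computed


# Hypothetical records for illustration only.
records = ["jonathan smith", "jonathon smith", "jon smyth", "maria garcia", "bo"]
index = PivotIndex(records, pivot="jonathan smith")
print(index.range_query("jonathan smyth", threshold=2))
```

Because pruning only discards records that provably lie outside the threshold, the result is the same as comparing the query against every record, which is the sense in which such a search is complete.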

    Neuregulin 1 and susceptibility to schizophrenia

    The cause of schizophrenia is unknown, but it has a significant genetic component. Pharmacologic studies, studies of gene expression in man, and studies of mouse mutants suggest involvement of glutamate and dopamine neurotransmitter systems. However, so far, strong association has not been found between schizophrenia and variants of the genes encoding components of these systems. Here, we report the results of a genomewide scan of schizophrenia families in Iceland; these results support previous work, done in five populations, showing that schizophrenia maps to chromosome 8p. Extensive fine-mapping of the 8p locus and haplotype-association analysis, supplemented by a transmission/disequilibrium test, identifies neuregulin 1 (NRG1) as a candidate gene for schizophrenia. NRG1 is expressed at central nervous system synapses and has a clear role in the expression and activation of neurotransmitter receptors, including glutamate receptors. Mutant mice heterozygous for either NRG1 or its receptor, ErbB4, show a behavioral phenotype that overlaps with mouse models for schizophrenia. Furthermore, NRG1 hypomorphs have fewer functional NMDA receptors than wild-type mice. We also demonstrate that the behavioral phenotypes of the NRG1 hypomorphs are partially reversible with clozapine, an atypical antipsychotic drug used to treat schizophrenia.

    Fourteen sequence variants that associate with multiple sclerosis discovered by meta-analysis informed by genetic correlations

    A meta-analysis of publicly available summary statistics on multiple sclerosis combined with three Nordic multiple sclerosis cohorts (21,079 cases, 371,198 controls) revealed seven sequence variants associating with multiple sclerosis, not reported previously. Using polygenic risk scores based on public summary statistics of variants outside the major histocompatibility complex region, we quantified genetic overlap between common autoimmune diseases in Icelanders and identified disease clusters characterized by autoantibody presence/absence. As multiple sclerosis polygenic risk scores capture the risk of primary biliary cirrhosis and vice versa (P = 1.6 × 10^-7 and 4.3 × 10^-9), we used primary biliary cirrhosis as a proxy-phenotype for multiple sclerosis, the idea being that variants conferring risk of primary biliary cirrhosis have a prior probability of conferring risk of multiple sclerosis. We tested 255 variants forming the primary biliary cirrhosis polygenic risk score and found seven multiple sclerosis-associating variants not correlated with any previously established multiple sclerosis variants. Most of the variants discovered are close to or within immune-related genes. One is a low-frequency missense variant in TYK2, another is a missense variant in MTHFR that reduces the function of the encoded enzyme affecting methionine metabolism, reported to be dysregulated in multiple sclerosis brain. Funding: Swedish Research Council; Knut and Alice Wallenberg Foundation; AFA Foundation; Swedish Brain Foundatio

    Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches comprised five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
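
The Jensen-Shannon divergence mentioned in this abstract as the basis for textual coherence can be sketched in a few lines. This is only the standard base-2 definition applied to whitespace-tokenized word distributions. How the study turns divergences into a per-cluster coherence score is not stated in the abstract and is not reproduced here. `word_distribution` and `jensen_shannon` are hypothetical helper names.

```python
from collections import Counter
from math import log2


def word_distribution(text):
    """Relative word frequencies of a text (naive lowercase whitespace split)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; q must cover p's support."""
    return sum(pv * log2(pv / q[w]) for w, pv in p.items() if pv > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy documents for illustration only.
doc_a = word_distribution("gene expression in schizophrenia")
doc_b = word_distribution("gene expression in multiple sclerosis")
doc_c = word_distribution("spatial index bulk loading")
print(jensen_shannon(doc_a, doc_b), jensen_shannon(doc_a, doc_c))
```

Lower values mean the two word distributions are closer, so a document that diverges little from its cluster's pooled distribution fits that cluster well.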